
Conversation

@Shubhrakanti Shubhrakanti commented Sep 9, 2025

The NVIDIA Riva SDK provides synchronous APIs for both speech-to-text and text-to-speech operations. To prevent blocking the event loop:

  • STT: Runs _recognition_worker in a separate thread to handle synchronous streaming_response_generator calls, using call_soon_threadsafe to emit speech events back to the main event loop
  • TTS: Runs _synthesize_worker in a separate thread for synchronous synthesize_online calls, using call_soon_threadsafe to push generated audio chunks to the output emitter

Try it out with uv run examples/voice_agents/nvidia_test.py console
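The worker-thread pattern described above can be sketched as a standalone toy, with a plain generator standing in for Riva's synchronous streaming_response_generator (all names here are illustrative, not the plugin's actual internals):

```python
import asyncio
import threading
import time


def blocking_generator():
    # stand-in for Riva's synchronous streaming_response_generator
    for i in range(3):
        time.sleep(0.05)  # simulated blocking network wait
        yield f"chunk-{i}"


async def main() -> list[str]:
    loop = asyncio.get_running_loop()
    received: list[str] = []
    done = asyncio.Event()

    def worker() -> None:
        # runs in a separate thread so the blocking iteration never
        # stalls the event loop; results are handed back thread-safely
        for chunk in blocking_generator():
            loop.call_soon_threadsafe(received.append, chunk)
        loop.call_soon_threadsafe(done.set)

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    await done.wait()  # event loop stays free while the thread blocks
    await asyncio.to_thread(thread.join)
    return received


print(asyncio.run(main()))  # ['chunk-0', 'chunk-1', 'chunk-2']
```

call_soon_threadsafe is the only safe way to schedule work on the loop from another thread; the FIFO ordering guarantees all chunks are appended before the done event fires.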


try:
    await asyncio.gather(*tasks)
    await asyncio.to_thread(synthesize_thread.join, timeout=5.0)


Hi, I just got a chance to test this, but this timeout is causing TTS to be cut off before finishing long sentences (>5 s). Is this working properly for you? I set it to 20 and everything works as expected.

@Shubhrakanti (Contributor Author)

I think we can remove the timeout altogether? The TTS can be arbitrarily long.
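A minimal sketch of joining without a timeout, assuming the worker thread eventually drains on its own once input ends (the long_synthesis function is a stand-in for a long synthesize_online call):

```python
import asyncio
import threading
import time


def long_synthesis() -> None:
    # stand-in for a synthesize_online call that can run arbitrarily long
    time.sleep(0.2)


async def shutdown(worker: threading.Thread) -> None:
    # no timeout: wait for the worker to finish naturally instead of
    # cutting speech off after a fixed number of seconds
    await asyncio.to_thread(worker.join)
    assert not worker.is_alive()


t = threading.Thread(target=long_synthesis)
t.start()
asyncio.run(shutdown(t))
```

Wrapping join in asyncio.to_thread keeps the event loop responsive even though the join itself is unbounded.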


self._auth = riva.client.Auth(
    uri=stt._opts.server,
    use_ssl=True,
@riqiang-dp riqiang-dp Sep 18, 2025

It would be nice to have the option to set it to False (for local deployment and testing).

if not self._tts_service:
    auth = riva.client.Auth(
        uri=self._opts.server,
        use_ssl=True,


same here
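One way such a toggle might be plumbed through an options object; this is a hedged sketch, and the _STTOptions dataclass and build_auth_kwargs helper here are hypothetical stand-ins, not the plugin's actual types:

```python
from dataclasses import dataclass


@dataclass
class _STTOptions:
    # hypothetical options holder: default to SSL for hosted endpoints,
    # but allow use_ssl=False for local deployment and testing
    server: str
    use_ssl: bool = True


def build_auth_kwargs(opts: _STTOptions) -> dict:
    # kwargs that would be forwarded to riva.client.Auth(...)
    return {"uri": opts.server, "use_ssl": opts.use_ssl}


local = build_auth_kwargs(_STTOptions(server="localhost:50051", use_ssl=False))
hosted = build_auth_kwargs(_STTOptions(server="grpc.nvcf.nvidia.com:443"))
```

The default preserves today's behavior, so callers who never touch the flag see no change.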

@Shubhrakanti (Contributor Author)

@riqiang-dp let me know if there's anything else you'd like changed!

uri=stt._opts.server,
use_ssl=stt._opts.use_ssl,
metadata_args=[
    ["authorization", f"Bearer {stt.nvidia_api_key}"],
@Shubhrakanti (Contributor Author)

If it's localhost we can just omit this (no need for an API key).


@Shubhrakanti Shubhrakanti requested review from theomonnom and removed request for riqiang-dp October 23, 2025 20:32

@Shubhrakanti Shubhrakanti left a comment


Need to think about these things. Let me know if you have any thoughts @theomonnom.

I should also add a comment on why we use a separate thread, since it's different from the usual async implementation in plugins.


Comment on lines 141 to 142
if self._thread_exception:
    raise self._thread_exception
@Shubhrakanti (Contributor Author)

Is this a good idea? I was hoping this would trigger the fallback adapter.
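The stash-and-re-raise pattern under discussion, as a self-contained sketch; the Worker class and the simulated ConnectionError are illustrative only, but this is the general mechanism that lets an upstream wrapper (such as a fallback adapter) observe a failure that happened on a worker thread:

```python
import threading


class Worker:
    """Capture an exception raised on a worker thread and re-raise it
    on the caller's side, instead of letting the thread die silently."""

    def __init__(self) -> None:
        self._thread_exception: BaseException | None = None
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _run(self) -> None:
        try:
            raise ConnectionError("riva stream failed")  # simulated failure
        except BaseException as e:
            self._thread_exception = e  # stash for the caller to re-raise

    def result(self) -> None:
        self._thread.start()
        self._thread.join()
        # re-raising here surfaces the thread's failure in the caller's
        # context, where wrapping code can catch it and fail over
        if self._thread_exception:
            raise self._thread_exception


try:
    Worker().result()
except ConnectionError as e:
    print(e)  # riva stream failed
```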

Comment on lines 206 to 207
except queue.Empty:
    continue
@Shubhrakanti (Contributor Author)

I should add a comment on why this is here.

service = self._tts._ensure_session()
while not self._shutdown_event.is_set():
    try:
        token = self._token_q.get(timeout=0.1)
@Shubhrakanti (Contributor Author)

Are there any performance issues here? Is there a better way to do this?
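One common alternative to the 0.1 s polling loop is a blocking get() plus a shutdown sentinel, so the thread sleeps until a token (or the stop signal) actually arrives instead of waking every 100 ms; a sketch, with hypothetical names:

```python
import queue
import threading

_SENTINEL = object()


def worker(token_q: "queue.Queue[object]", out: list) -> None:
    # blocking get with no timeout: the thread parks until a token or
    # the shutdown sentinel arrives, replacing the Event + timeout loop
    while True:
        token = token_q.get()
        if token is _SENTINEL:
            break
        out.append(token)


q: "queue.Queue[object]" = queue.Queue()
out: list = []
t = threading.Thread(target=worker, args=(q, out))
t.start()
for tok in ("hello", "world"):
    q.put(tok)
q.put(_SENTINEL)  # shutdown signal
t.join()
print(out)  # ['hello', 'world']
```

The trade-off: polling costs a few wakeups per second but lets you also check an external shutdown flag; the sentinel avoids the wakeups entirely but requires that shutdown always goes through the queue.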

Comment on lines 108 to 109
logger.info("Available TTS voices:")
logger.info(json.dumps(tts_models, indent=4))
Member

debug logs

@Shubhrakanti (Contributor Author)

@theomonnom I've updated the threading model and cleaned up some unused code; should be good to go.

@Shubhrakanti Shubhrakanti merged commit 138c8b9 into main Nov 4, 2025
18 checks passed
@Shubhrakanti Shubhrakanti deleted the shubhra/nvidia-plugins branch November 4, 2025 17:49
Comment on lines 19 to 126
from livekit.plugins import cartesia, openai, silero
from livekit.plugins.turn_detector.multilingual import MultilingualModel

# uncomment to enable Krisp background voice/noise cancellation
# from livekit.plugins import noise_cancellation

logger = logging.getLogger("basic-agent")

load_dotenv()


class MyAgent(Agent):
    def __init__(self) -> None:
        super().__init__(
            instructions="Your name is Kelly. You will interact with users via voice. "
            "With that in mind, keep your responses concise and to the point. "
            "Do not use emojis, asterisks, markdown, or other special characters in your responses. "
            "You are curious and friendly, and have a sense of humor. "
            "You will speak English to the user.",
        )

    async def on_enter(self):
        # when the agent is added to the session, it'll generate a reply
        # according to its instructions
        self.session.generate_reply()

    # all functions annotated with @function_tool will be passed to the LLM when this
    # agent is active
    @function_tool
    async def lookup_weather(
        self, context: RunContext, location: str, latitude: str, longitude: str
    ):
        """Called when the user asks for weather related information.
        Ensure the user's location (city or region) is provided.
        When given a location, please estimate the latitude and longitude of the location and
        do not ask the user for them.

        Args:
            location: The location they are asking for
            latitude: The latitude of the location, do not ask user for it
            longitude: The longitude of the location, do not ask user for it
        """

        logger.info(f"Looking up weather for {location}")

        return "sunny with a temperature of 70 degrees."


def prewarm(proc: JobProcess):
    proc.userdata["vad"] = silero.VAD.load()


async def entrypoint(ctx: JobContext):
    # each log entry will include these fields
    ctx.log_context_fields = {
        "room": ctx.room.name,
    }
    session = AgentSession(
        # Speech-to-text (STT) is your agent's ears, turning the user's speech into text that the LLM can understand
        # See all available models at https://docs.livekit.io/agents/models/stt/
        stt=cartesia.STT(),
        # A Large Language Model (LLM) is your agent's brain, processing user input and generating a response
        # See all available models at https://docs.livekit.io/agents/models/llm/
        llm=openai.LLM(),
        # Text-to-speech (TTS) is your agent's voice, turning the LLM's text into speech that the user can hear
        # See all available models as well as voice selections at https://docs.livekit.io/agents/models/tts/
        tts="cartesia/sonic-2:9626c31c-bec5-4cca-baa8-f8ba9e84c8bc",
        # VAD and turn detection are used to determine when the user is speaking and when the agent should respond
        # See more at https://docs.livekit.io/agents/build/turns
        turn_detection=MultilingualModel(),
        vad=ctx.proc.userdata["vad"],
        # allow the LLM to generate a response while waiting for the end of turn
        # See more at https://docs.livekit.io/agents/build/audio/#preemptive-generation
        preemptive_generation=True,
        # sometimes background noise could interrupt the agent session; these are considered false positive interruptions
        # when one is detected, you may resume the agent's speech
        resume_false_interruption=True,
        false_interruption_timeout=1.0,
    )

    # log metrics as they are emitted, and total usage after session is over
    usage_collector = metrics.UsageCollector()

    @session.on("metrics_collected")
    def _on_metrics_collected(ev: MetricsCollectedEvent):
        metrics.log_metrics(ev.metrics)
        usage_collector.collect(ev.metrics)

    async def log_usage():
        summary = usage_collector.get_summary()
        logger.info(f"Usage: {summary}")

    # shutdown callbacks are triggered when the session is over
    ctx.add_shutdown_callback(log_usage)

    await session.start(
        agent=MyAgent(),
        room=ctx.room,
        room_input_options=RoomInputOptions(
            # uncomment to enable Krisp BVC noise cancellation
            # noise_cancellation=noise_cancellation.BVC(),
        ),
        room_output_options=RoomOutputOptions(transcription_enabled=True),
    )

    await session.say("hello world")
    session.shutdown()

@Shubhrakanti (Contributor Author)

Accidental commit. Will undo with #3797

Shubhrakanti added a commit that referenced this pull request Nov 4, 2025


5 participants